Please visit INSERT LINK HERE to view the workshop’s accompanying workbook
Install and load R packages (INSERT LINK HERE TO SECTION IN WORKBOOK)
(Optional) Download the workshop slides (instructions in the workbook’s Introduction)
Follow along and have fun!
Who Are We?
Marc Weber is a geographer at the Pacific Ecological Systems Division (PESD) at the United States Environmental Protection Agency (USEPA). His work supports various aspects of the USEPA’s National Aquatic Resource Surveys (NARS), which characterize the condition of waters across the United States; he helped develop and maintains the StreamCat and LakeCat datasets. His work focuses on spatial analysis in R and Python, Geographic Information Science (GIS), aquatic ecology, remote sensing, open source science, and environmental modeling.
Who Are We?
Michael Dumelle is a statistician for the United States Environmental Protection Agency (USEPA). He works primarily on facilitating the survey design and analysis of USEPA’s National Aquatic Resource Surveys (NARS), which characterize the condition of waters across the United States. His primary research interests are in spatial statistics, survey design, environmental and ecological applications, and software development.
Disclaimer
The views expressed in this workshop are those of the authors and do not necessarily represent the views or policies of the U.S. Environmental Protection Agency or the U.S. National Oceanic and Atmospheric Administration. Any mention of trade names, products, or services does not imply an endorsement by the U.S. government, the U.S. Environmental Protection Agency, or the U.S. National Oceanic and Atmospheric Administration. The U.S. Environmental Protection Agency and the U.S. National Oceanic and Atmospheric Administration do not endorse any commercial products, services, or enterprises.
What Will We Cover?
Foundations of spatial data
Visualizing spatial data
Geoprocessing spatial data
Advanced applications
Focus on the R computing language
Foundations
Goals
Understand fundamental spatial data structures and libraries in R.
Become familiar with coordinate reference systems.
Geographic I/O (input/output).
Data Structures
Review data structures section in workbook
Cover vectors, matrices, arrays, lists, data frames, etc.
Cover data frame manipulation using tidyverse functions and the pipe operator (%>% or |>)
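As a quick refresher, here is a base-R sketch of a data frame and the native pipe (the gauge data are made up for illustration; with dplyr loaded, `%>%` works similarly):

```r
# A small data frame of (hypothetical) stream gauge readings
gauges <- data.frame(
  site = c("A", "B", "C"),
  flow = c(10, 25, 17)
)

# The native pipe |> (R >= 4.1) passes the left-hand side
# as the first argument to the right-hand function
high_flow <- gauges |> subset(flow > 15)
nrow(high_flow)  # 2
```

The same pipeline with dplyr would read `gauges %>% filter(flow > 15)`.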
Why R for Spatial Data Analysis?
Lightweight, open-source, and cross-platform
Provides tools for automation and reproducibility
Seamlessly integrates spatial data analysis and statistical analysis in one environment
Handles vector and raster data
R Drawbacks for GIS Work
R is less interactive than desktop applications like ArcGIS or QGIS
Handling coordinate reference systems is more challenging
In-memory analysis can be prohibitive for large data
Steep learning curve
A Motivating Example
Simple, expressive code
library(tmap)
library(tmaptools)
library(dplyr)
# find my town!
my_town <- tmaptools::geocode_OSM("Corvallis OR USA", as.sf = TRUE)
glimpse(my_town)
Spatial data structures are primarily organized through:
GDAL link here: For raster and feature abstraction and processing
PROJ link here: For coordinate transformations and projections
GEOS link here: For spatial operations (e.g., calculating buffers or centroids) on data with a projected coordinate reference system (CRS)
We explore these core libraries throughout the workshop
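As a quick check, sf can report which versions of these libraries it links to (this sketch assumes the sf package is installed):

```r
library(sf)

# Report the GEOS, GDAL, and PROJ versions sf was built against
sf::sf_extSoftVersion()
```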
Types of Spatial Data
Vector data are composed of points, lines, and polygons that represent discrete spatial entities (e.g., river, watershed, stream gauge)
Raster data divide space into small rectangles (pixels) that represent spatially continuous phenomena like elevation or precipitation
Figure 2: Vector and raster data.
Vector Data Model
Vector data are described using simple features
Simple features is a standard that specifies how two-dimensional geometries are represented, stored in, and retrieved from databases, and which geometric operations can be performed on them
Provides a framework for extending POINT geometries to LINESTRING and POLYGON geometries
Figure 3: The simple features data model.
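A minimal sketch of the three basic simple feature geometry types, built from coordinate matrices (assumes the sf package is installed):

```r
library(sf)

# POINT, LINESTRING, and POLYGON built from raw coordinates;
# a polygon ring must close (first coordinate equals last)
pt   <- st_point(c(0, 0))
ln   <- st_linestring(rbind(c(0, 0), c(1, 1), c(2, 1)))
poly <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 0))))

st_geometry_type(pt)   # POINT
st_geometry_type(ln)   # LINESTRING
st_geometry_type(poly) # POLYGON
```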
The sf Package
We use the sf package to work with spatial vector data in R
So far we have explored plotting using ggplot(), but you can also use sf’s base plotting via plot(). Read more about plotting in sf by running ?plot.sf and then make an appropriate plot of the us_states data.
05:00
Raster Data Model
Raster data can be continuous or categorical
Can be image-based and can include a temporal component
Figure 9: The raster data model.
The terra Package
We use the terra package to work with spatial raster data in R
We just printed myrast and saw several components: class, dimensions, resolution, extent, and coord. ref. Explicitly return and inspect each of these pieces using class(), dim(), res(), ext(), and crs(). Bonus: Return the number of raster cells using ncell().
05:00
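As a sketch of what those accessors return, here is a small raster built from scratch (assumes the terra package is installed; myrast in the workbook is created similarly):

```r
library(terra)

# A 6 x 6 raster spanning the globe in WGS84
r <- rast(nrows = 6, ncols = 6,
          xmin = -180, xmax = 180, ymin = -90, ymax = 90,
          crs = "EPSG:4326")

class(r)  # "SpatRaster"
dim(r)    # rows, columns, layers
res(r)    # cell size: 60 by 30 degrees
ext(r)    # spatial extent
crs(r)    # coordinate reference system (WKT)
ncell(r)  # 36 cells
```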
Manipulating Raster Objects
Let’s add values to the raster object
values(myrast) <- 1:ncell(myrast)
plot(myrast)
Manipulating Raster Objects
Figure 10: Raster plot.
Reading a Raster Object
Let's read in a raster object from the spDataLarge package that contains elevation data from Zion National Park
The authority:code identifier is the modern identifier in R, and our preferred method
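For example, an authority:code string can be passed directly to sf to build a CRS object (assumes the sf package is installed):

```r
library(sf)

# EPSG:4326 is the authority:code for WGS84 geographic coordinates
wgs84 <- st_crs("EPSG:4326")
st_is_longlat(wgs84)  # TRUE: a geographic (longitude/latitude) CRS
```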
Projected Coordinate Systems
Projected coordinates have been projected to two-dimensional space according to a CRS
Projected coordinates have an origin, x-axis, y-axis, and unit of measure
Conformal projections preserve shape
Equal-area projections preserve area
Equidistant projections preserve distance
Azimuthal projections preserve direction
Projected Coordinate Systems
Here is an example using vector data; see the workbook for an example using raster data
library(Rspatialworkshop)
data("pnw")
# transform one to the other
utmz11 <- 2153
pnw <- st_transform(pnw, crs = utmz11)
ggplot() +
  geom_sf(data = pnw, color = "black", fill = NA) +
  labs(title = "State Boundaries in the Pacific Northwest") +
  theme_bw()
Projected Coordinate Systems
Figure 13: Gages and PNW data in the same projection.
Your Turn
Re-project the pnw data to a different projected CRS. Then plot using base R or ggplot2.
06:00
A Note on S2
sf version 1.0.0 supports spherical geometry operations via its interface to Google’s S2 spherical geometry engine
S2 is an example of a Discrete Global Grid System (DGGS)
sf can run with S2 on or off; by default, the S2 geometry engine is turned on:
sf::sf_use_s2()
[1] TRUE
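S2 can also be toggled off to fall back on planar GEOS computations, and turned back on afterwards (a sketch, assuming the sf package is installed):

```r
library(sf)

# Turn the S2 spherical geometry engine off
# (planar GEOS computations are used instead)
sf_use_s2(FALSE)

# ... run operations that assume planar geometry ...

# Turn S2 back on (the default)
sf_use_s2(TRUE)
```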
Geographic Data I/O (Input/Output)
There are several ways to read spatial data into R
Load spatial data from our machine or a remote source
Load spatial data as part of an R package
Load data using an API (which often makes use of an R package)
Convert flat files with x, y coordinates to spatial data
Geocoding data “by-hand” (we saw this earlier)
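For example, a flat table with coordinate columns can be promoted to an sf object (assumes the sf package is installed; the site names and lon/lat values are made up for illustration):

```r
library(sf)

# A flat table with coordinate columns (illustrative data)
sites <- data.frame(
  name = c("gauge1", "gauge2"),
  lon  = c(-123.26, -122.68),
  lat  = c(44.57, 45.52)
)

# Promote to an sf object; crs = 4326 declares WGS84 coordinates
sites_sf <- st_as_sf(sites, coords = c("lon", "lat"), crs = 4326)
sites_sf
```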
Vector Data I/O
sf can read numerous file types:
shapefiles
geodatabases
geopackages
geojson
spatial database files
Read in a Shapefile
filepath <- system.file("extdata/city_limits.shp", package = "Rspatialworkshop")
citylims <- read_sf(filepath)
plot(st_geometry(citylims), axes = TRUE, main = "Oregon City Limits")
Read in a Shapefile
Figure 14: Oregon City Limits
Your Turn
Run ?read_sf() and compare read_sf() to st_read(). Our preference is to use read_sf() – why do you think that is?
03:00
Read in a Geodatabase
filepath <- system.file("extdata/StateParkBoundaries.gdb", package = "Rspatialworkshop")
# List all feature classes in a file geodatabase
st_layers(filepath)
# Read the feature class
parks <- st_read(dsn = filepath, layer = "StateParkBoundaries")
ggplot(parks) + geom_sf()